Aligning sequences with repetitive motifs
Authors
Abstract
Pairwise sequence alignment is among the most intensively studied problems in computational biology. We present a method for aligning two sequences that contain repetitive motifs. The work is motivated by biological studies of proteins with zinc finger domains, an important group of regulatory proteins. Due to their evolutionary history, the sequences of these proteins contain a variable number of different zinc fingers (short subsequences with specific symbols at each position). Our algorithm uses two types of hidden Markov models (HMMs): pair HMMs and profile HMMs. Profile HMMs describe the structure of sequence motifs. Pair HMMs assign a probability to an alignment of two motifs. Combining these two types of models yields an algorithm that uses different scores when aligning conserved versus variable motif residues. The dynamic programming algorithm that computes the motif alignments is based on the well-known Viterbi algorithm. We evaluated our model on sequences of zinc finger proteins and compared it with existing alternatives.
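The abstract does not spell out the model's states or parameters, so the following is only a minimal sketch of the general idea behind Viterbi decoding in a pair HMM. It assumes a textbook three-state pair HMM over single residues (match state M, and gap states X and Y) and is not the motif-level model of the paper. The function name `pair_hmm_viterbi` and every parameter value (`delta`, `epsilon`, `p_same`, `p_diff`, `q`) are hypothetical choices made for illustration.

```python
import math

NEG_INF = float("-inf")


def pair_hmm_viterbi(x, y, delta=0.1, epsilon=0.3,
                     p_same=0.2, p_diff=0.01, q=0.05):
    """Most probable alignment of x and y under a toy 3-state pair HMM.

    M emits an aligned pair, X emits a symbol of x against a gap,
    Y emits a symbol of y against a gap. All parameter values are
    illustrative assumptions, not values from the paper.
    """
    n, m = len(x), len(y)
    log = math.log
    t_mm, t_mg = log(1 - 2 * delta), log(delta)   # M->M, M->X or M->Y
    t_gg, t_gm = log(epsilon), log(1 - epsilon)   # gap->same gap, gap->M
    # V[s][i][j]: best log probability of emitting x[:i], y[:j], ending in s
    V = {s: [[NEG_INF] * (m + 1) for _ in range(n + 1)] for s in "MXY"}
    back = {s: [[None] * (m + 1) for _ in range(n + 1)] for s in "MXY"}
    V["M"][0][0] = 0.0  # by convention, the model starts in M

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                emit = log(p_same if x[i - 1] == y[j - 1] else p_diff)
                best, prev = max((V["M"][i - 1][j - 1] + t_mm, "M"),
                                 (V["X"][i - 1][j - 1] + t_gm, "X"),
                                 (V["Y"][i - 1][j - 1] + t_gm, "Y"))
                V["M"][i][j], back["M"][i][j] = best + emit, prev
            if i > 0:
                best, prev = max((V["M"][i - 1][j] + t_mg, "M"),
                                 (V["X"][i - 1][j] + t_gg, "X"))
                V["X"][i][j], back["X"][i][j] = best + log(q), prev
            if j > 0:
                best, prev = max((V["M"][i][j - 1] + t_mg, "M"),
                                 (V["Y"][i][j - 1] + t_gg, "Y"))
                V["Y"][i][j], back["Y"][i][j] = best + log(q), prev

    # Trace back from the best final state to recover the alignment.
    state = max("MXY", key=lambda s: V[s][n][m])
    score = V[state][n][m]
    ax, ay, i, j = [], [], n, m
    while i > 0 or j > 0:
        prev = back[state][i][j]
        if state == "M":
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif state == "X":
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
        state = prev
    return score, "".join(reversed(ax)), "".join(reversed(ay))
```

With these toy parameters, `pair_hmm_viterbi("ACGT", "AGT")` aligns `ACGT` against `A-GT`. A motif-aware variant would presumably replace the single pair of emission values `p_same` and `p_diff` with position-specific scores, so that conserved and variable motif positions are scored differently, but how the paper does this is not stated in the abstract.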
Similar resources
Analysis of repetitive amino acid motifs reveals the essential features of spider dragline silk proteins
The extraordinary mechanical properties of spider dragline silk are dependent on the highly repetitive sequences of the component proteins, major ampullate spidroin 1 and 2 (MaSp1 and MaSp2). MaSp sequences are dominated by repetitive modules composed of short amino acid motifs; however, the patterns of motif conservation through evolution and their relevance to silk characteristics are not wel...
Small, repetitive DNAs contribute significantly to the expanded mitochondrial genome of cucumber.
Closely related cucurbit species possess eightfold differences in the sizes of their mitochondrial genomes. We cloned mitochondrial DNA (mtDNA) fragments showing strong hybridization signals to cucumber mtDNA and little or no signal to watermelon mtDNA. The cucumber mtDNA clones carried short (30-53 bp), repetitive DNA motifs that were often degenerate, overlapping, and showed no homology to an...
A study of the repetitive structure and distribution of short motifs in human genomic sequences
Over the last several years, the search for functional genomic elements by exploiting motif over-representation has become increasingly popular. However, about half of the human genome is repetitive, and that is also the case with most higher eukaryotes. In this study we have shown that in addition to these known repeats, human sequences feature many short over-represented motifs, and that their fre...
Functional motifs in Escherichia coli NC101
According to recent reports, Escherichia coli (E. coli) bacteria can damage the DNA of gut lining cells and may encourage the development of colon cancer. Genetic switches are specific sequence motifs, and many of them are drug targets. It is therefore of interest to identify motifs and their locations in sequences. In the present study, the Gibbs sampler algorithm was used to predict and find functional m...
The roles of EPIYA sequence to perturb the cellular signaling pathways and cancer risk
It was shown that several pathogenic bacterial effector proteins contain the Glu-Pro-Ile-Tyr-Ala (EPIYA) or a similar sequence. These bacterial EPIYA effectors are delivered into host cells via the type III or type IV secretion system, where they undergo tyrosine phosphorylation at the EPIYA sequences, which triggers interaction with multiple host cell SH2 domain-containing proteins and thereby...